Overview

Analysing daily mixes and discover weekly playlists

As an active Spotify user I often contemplate maintaining an enormous playlist of songs that I like. However, in my opinion, if you really like a song you’ll remember it, so there is no need for a playlist to remind you of what you like. That being said, I do need some type of playlist to get my day started, so I listen to my daily mixes almost exclusively. Occasionally I listen to my discover weekly playlist, which always surprises me with the quality of its recommendations. Somehow these playlists seem fresh every time while still containing a lot of songs that I know and love.

This made me wonder how these playlists vary from day to day and week to week and how my discover weekly relates to my daily mixes.

Therefore I will be analyzing my daily mixes and discover weekly playlists over a period of 8 weeks. The main interest lies in the relation between the daily mixes and the discover weekly playlist. There are two things I hope to discover:

  1. Is there a way to predict my discover weekly playlist of the next week based on my daily mixes of the current week?

  2. Is there a way to predict which daily mixes I listened to in the previous week(s) based on my discover weekly playlist?

My daily mixes usually range from playlists containing Pixies and the Velvet Underground to playlists containing Miles Davis and Charlie Parker, to playlists containing Eagles of Death Metal and the Black Keys, to playlists containing Canned Heat and Roy Buchanan. Sometimes my daily mixes differ a lot from each other, which turns my discover weekly playlist into a total mess. This will be very interesting to analyze.

My Spotify user ID: kmkov4v2xhms7od6p3gq32wfv.

Below you can see a table of the 20 tracks that occur most often in my corpus.

# A tibble: 20 x 3
   track.name                         track.artists                n
   <chr>                              <chr>                    <int>
 1 Skating In Central Park            Bill Evans, Jim Hall        24
 2 Don't Call My Name                 Skinshape                   23
 3 Say It (Over And Over Again)       John Coltrane Quartet       23
 4 High Ball Stepper                  Jack White                  21
 5 Level                              The Raconteurs              21
 6 New Fang                           Them Crooked Vultures       21
 7 After Hours - Live                 Jimmy Smith                 20
 8 Can You Get to That                Funkadelic                  20
 9 Goodbye Stranger - 2010 Remastered Supertramp                  20
10 Let My People Go                   Darondo                     20
11 Like It Is - Remastered            Yusef Lateef                20
12 Satan Said Dance                   Clap Your Hands Say Yeah    20
13 Stuck In the Metal                 Eagles Of Death Metal       20
14 Superfly                           Curtis Mayfield             20
15 The Rat                            The Walkmen                 20
16 All The Things You Are             Dizzy Gillespie             19
17 Daddy Blue                         Brad stank                  19
18 Destination                        Felt                        19
19 Green Onions                       Booker T. & the M.G.'s      19
20 Hit It and Quit It                 Funkadelic                  19
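A table like this boils down to counting how often each track shows up across all playlist snapshots. Here is a minimal sketch in Python, not the code that produced the table above. The row layout (dictionaries with `track.name` and `track.artists`) and the function name `top_tracks` are assumptions for illustration, and the three-row corpus is made up.

```python
from collections import Counter

def top_tracks(rows, n=20):
    """Count how often each (track name, artists) pair occurs; return the n most common."""
    counts = Counter((row["track.name"], row["track.artists"]) for row in rows)
    # Highest count first; ties are broken alphabetically by track name.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0][0]))
    return [(name, artists, count) for (name, artists), count in ranked[:n]]

# Toy corpus: one row per track per playlist snapshot (hypothetical data).
corpus = [
    {"track.name": "Superfly", "track.artists": "Curtis Mayfield"},
    {"track.name": "The Rat", "track.artists": "The Walkmen"},
    {"track.name": "Superfly", "track.artists": "Curtis Mayfield"},
]
print(top_tracks(corpus, n=2))
```

Counting on the name and artist pair rather than on the name alone keeps two different songs with the same title apart.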

Analysis of the weekly playlists

Valence, energy, major/minor and tempo


In terms of energy and valence the two discover weeklies are distributed fairly similarly. However, weekly 1 is more scattered, whereas weekly 2 can be divided into two groups (seen in the next plot). Loudness is pretty much the same across all songs. Both playlists consist mostly of songs in a major key, which honestly surprised me. Something interesting in weekly 2 is that the songs with the highest valence and highest energy are in a minor key.

Change of energy in songs in the discover weekly playlists

Keys


Weekly 1 seems to be mostly in “A”. This makes sense because the songs in that playlist are a lot simpler, and A is a pretty “go to” key in my opinion.

Daily v Weekly


Valence and Energy 1

Valence and Energy 2

Valence and Energy 3

Valence and Energy 4


Some Explanation

In weeks 1-3 the mean energy and valence of the discover weekly playlist lies in the middle of the means of the daily mixes. In week 4 the discover weekly mean falls outside those of the daily mixes.
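One way to make "in the middle of the daily mixes" checkable is to compare mean points. The sketch below is in Python and rests on an assumption of mine: "in the middle" is read as "inside the box spanned by the daily mix means on both axes". The helper names and all numbers are hypothetical, not taken from the corpus.

```python
from statistics import mean

def centroid(tracks):
    """Mean (valence, energy) of a playlist given as a list of (valence, energy) pairs."""
    return (mean(v for v, _ in tracks), mean(e for _, e in tracks))

def inside_range(point, others):
    """True if the point lies within the min/max box of the other points on both axes."""
    return all(
        min(o[axis] for o in others) <= point[axis] <= max(o[axis] for o in others)
        for axis in (0, 1)
    )

# Hypothetical playlists, each track given as (valence, energy).
daily_mixes = {
    "daily 1": [(0.2, 0.3), (0.4, 0.5)],
    "daily 2": [(0.7, 0.8), (0.9, 0.6)],
}
weekly = [(0.5, 0.5), (0.6, 0.6)]

daily_centroids = [centroid(tracks) for tracks in daily_mixes.values()]
print(inside_range(centroid(weekly), daily_centroids))
```

A bounding box is a crude test; a convex hull over the daily mix means would be stricter, but with only a handful of mixes the box is probably good enough for a first look.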

Chroma Features

Chromagrams for two versions of Paranoid Android

Radiohead


Same song, drastically different performances. You can see that both versions have a similar structure in their chord changes, but the changes happen at different moments. Brad Mehldau also really emphasizes one chord, whereas the Radiohead version has its chroma magnitude more spread out at each point in time. This is probably because the Brad Mehldau version is mostly piano. Also note that the original (Radiohead) version ends mostly in A, with some magnitude in D, E and F as well. Brad Mehldau turns this around, putting a lot of the energy in D, E and F and less in A.

Brad Mehldau

Trying to find a dynamic warp path for these two versions is quite hard


Because of the completely different timing and the generally different approach to the song, Dynamic Time Warping treats these as completely different songs. Let’s take a look at another example.

DTW path is even less observable now


Even though you can immediately link both songs when you listen to them (probably because of the lyrics), they are nothing alike according to DTW.

Still, when I listen to these two covers I feel that they are more similar than the two “Paranoid Android” versions, which do contain some clearer warping paths (upper right area).

(Side note: if you don’t know the Roy Buchanan version and you like the Jimi Hendrix version, really give Roy’s version a listen.)

Twelve bar blues

Crossroads by Cream

I Can’t Quit You Baby by Led Zeppelin

DTW


I’m pretty sure there is an example of two 12 bar blues tracks with a really visible DTW line, but I haven’t found it yet.

Concluding thoughts

Covers don’t have to sound like each other. (Will expand in the future)

I am pretty sure I will remove this section because I do not think it will be interesting for my portfolio. However, chroma analysis might be useful for similarity measures: turning every DTW into a single number, for example by summing all the distances along the warping path, to see how similar two songs are chroma-wise (very costly though).
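The idea of turning a DTW into one number can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions: cosine distance between chroma frames, the standard three-step recursion, and division by the summed sequence lengths so that long songs are not penalised. The one-hot "chroma" frames are made up; real frames would come from an audio analysis.

```python
import math

def cosine_distance(a, b):
    """1 minus the cosine similarity of two chroma vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm > 0 else 1.0

def dtw_cost(seq_a, seq_b, dist=cosine_distance):
    """Summed distance along the cheapest warping path, divided by len(a) + len(b)."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # acc[i][j] = cheapest cost of aligning the first i frames of a with the first j of b.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(seq_a[i - 1], seq_b[j - 1])
            acc[i][j] = step + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m] / (n + m)

def one_hot(pitch_class):
    """Toy chroma frame with all magnitude in a single pitch class (0 = C)."""
    return [1.0 if i == pitch_class else 0.0 for i in range(12)]

original = [one_hot(9), one_hot(2), one_hot(4)]
slower = [one_hot(9), one_hot(9), one_hot(2), one_hot(2), one_hot(4)]
other = [one_hot(0), one_hot(7), one_hot(5)]
print(dtw_cost(original, slower), dtw_cost(original, other))
```

The nested loop visits every pair of frames, so the cost grows with the product of the two song lengths, which is where the "very costly" part comes from.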

Chroma and Timbre analyses for two completely different songs

Miss Alissa by Eagles of Death Metal and Heather by Billy Cobham

So I took two songs that should be polar opposites according to valence and energy, and boy is that right.

In the lower left corner, with a valence of 0.041, an energy rating of 0.0079 and a length of 8 minutes and 39 seconds, we have the smooth and shockingly calming “Heather” by Billy Cobham.

In the opposite upper right corner, with a valence of 0.877, an energy rating of 0.992 and a length of 2 minutes and 38 seconds, we have the one you can’t stop dancing to: “Miss Alissa” by Eagles of Death Metal.

I analysed differences in chromagrams, cepstrograms and self-similarity matrices.

Chroma comparison


Miss Alissa seems to be all over the place in terms of chroma, whilst Heather is more organized and stays close to F for the whole duration of the song. Of course Heather is a lot longer so it leaves a lot more room for organization and build up.

MFCC comparison


Miss Alissa seems to be mostly concentrated in the first cepstral coefficient, whilst Heather is concentrated more in the second and third.

The first cepstral coefficient relates to loudness, which seems appropriate because Miss Alissa is quite loud and Heather is quite quiet.

Around 220 seconds the saxophone joins in on Heather, which is observable in the cepstrogram because the magnitude becomes more spread out.

Self similarity matrices Timbre


The self similarity matrices for timbre show that Miss Alissa is fairly constant in timbre, whilst Heather shows a big change around 220 seconds, which is where the saxophone takes the lead.
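For readers who have not seen one built: a self similarity matrix compares every frame of a song with every other frame. Below is a minimal Python sketch, assuming Euclidean distance (so strictly speaking it is a self-distance matrix, where 0 means identical) and two-dimensional toy frames in place of real timbre vectors. `largest_change` is a hypothetical helper that finds the biggest jump between neighbouring frames, like the one where the saxophone enters.

```python
import math

def self_similarity(frames):
    """Pairwise Euclidean distance between all feature frames (0 = identical)."""
    return [[math.dist(a, b) for b in frames] for a in frames]

def largest_change(matrix):
    """Index i where the distance between frame i and frame i + 1 is largest."""
    steps = [matrix[i][i + 1] for i in range(len(matrix) - 1)]
    return max(range(len(steps)), key=steps.__getitem__)

# Toy song: three frames of one timbre followed by two frames of another.
frames = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 2
ssm = self_similarity(frames)
print(largest_change(ssm))
```

In the toy matrix the first three frames form one block of zeros and the last two another, which is the same block pattern the timbre plots show on a larger scale.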

Self Similarity matrices Pitch


Now it’s funny to see that the chromagram of Miss Alissa is complete chaos while its self similarity matrix is fairly constant and looks like one block. This implies that the song might be quite chaotic, but that the chaos is very constant, which is actually a pretty good description of the song.

Heather, on the other hand, has a very organized chromagram that looks fairly constant, but its self similarity matrix for pitch looks a lot more complex than that of Miss Alissa.

Key Estimation


Giant Steps Keydograph


Giant Steps

Giant Steps is notorious for its intricate key changes: it changes key 10 times throughout the song, moving between B, Eb and G, all major thirds apart.

In the keydograph (?) you can see that it is hard to tell what key the song is in.

The most apparent key in the keydograph is Cmaj, which is a semitone above the key the song starts in. Why exactly this is, I do not know.

It is possible that the melody, which often wanders away from the notes of the current key, emphasizes different notes and so suggests a different key. Combined with the many key changes, this might be why the keys are all over the place.
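To see why a wandering melody can throw a key estimate off, it helps to look at how template matching works. The Python sketch below correlates a chroma vector with a key profile rotated to each of the 24 keys and picks the best match. The numbers are the commonly cited Krumhansl-Kessler profiles; whether the plot in this portfolio used the same templates is an assumption, and the function names are mine.

```python
import math

NOTES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]
# Krumhansl-Kessler key profiles, written out for C major and C minor.
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]

def pearson(x, y):
    """Pearson correlation between two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def rotate(profile, tonic):
    """Shift a C-based profile so that its first value lands on the given tonic."""
    return [profile[(i - tonic) % 12] for i in range(12)]

def estimate_key(chroma):
    """Return the key label whose rotated profile correlates best with the chroma."""
    candidates = [
        (pearson(chroma, rotate(profile, tonic)), f"{NOTES[tonic]}:{mode}")
        for tonic in range(12)
        for mode, profile in (("maj", MAJOR), ("min", MINOR))
    ]
    return max(candidates)[1]

# Toy chroma: equal magnitude on B, Eb (D#) and F#, the notes of a B major triad.
triad = [1.0 if i in (11, 3, 6) else 0.0 for i in range(12)]
print(estimate_key(triad))
```

Because the estimate depends only on which pitch classes carry magnitude in a window, a melody that leans on notes outside the current key shifts the correlation toward neighbouring keys, and a window that straddles a key change mixes two profiles.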

Conclusion: Sorry Mr. Coltrane, I can’t find your keys here.

Tempo estimation

Unfortunate Message

Unfortunately my laptop kept crashing when I tried to make tempograms. I am going to look into this because I would like to include them in my portfolio.

Clustering comparison to daily mixes

Clustering


I thought it might be fun to see if I could cluster a day of songs into 6 clusters that resemble daily mixes. Hierarchical clustering did not seem like the most natural option for this: the dendrogram splits in two at every branch, so picking a depth gives either 4 or 8 clusters. (Cutting the tree at a chosen height can give 6 clusters, but then the clusters sit at different depths.)

Therefore I have used k-means with k=6 to find 6 clusters.

I used energy, key, loudness, mode, speechiness, acousticness, instrumentalness, liveness, valence, tempo, time_signature, track.duration_ms and the MFCCs as features for clustering.
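For reference, this is roughly what k-means does with such a feature table. It is a plain-Python sketch, not the code behind the plots. The standardisation step is my assumption (without it, features on large scales such as tempo or duration would dominate the distances), and the toy rows have only two hypothetical features and are clustered with k=2 instead of 6.

```python
import math
import random

def standardise(rows):
    """Scale every feature column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / len(col)) or 1.0
        for col, m in zip(columns, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def kmeans(rows, k, max_iterations=100, seed=0):
    """Lloyd's algorithm: returns one cluster label (0..k-1) per row."""
    rng = random.Random(seed)
    centres = [list(row) for row in rng.sample(rows, k)]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: every row goes to its nearest centre.
        new_labels = [
            min(range(k), key=lambda c: math.dist(row, centres[c])) for row in rows
        ]
        if new_labels == labels:
            break  # nothing changed, so the clustering has converged
        labels = new_labels
        # Update step: every centre moves to the mean of its members.
        for c in range(k):
            members = [row for row, label in zip(rows, labels) if label == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy rows: (energy, instrumentalness) for six hypothetical songs.
songs = [[0.10, 0.90], [0.20, 0.80], [0.15, 0.85],
         [0.90, 0.10], [0.80, 0.20], [0.85, 0.15]]
labels = kmeans(standardise(songs), k=2)
print(labels)
```

The result depends on the random starting centres, so fixing the seed (or running several restarts and keeping the best) is worth doing before comparing clusters with daily mixes.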

Here you can see a graph of the energy and valence of the daily mixes (top) and the clusters (bottom). It is fairly apparent that valence and energy do not do much to distinguish either the daily mixes or the k-means clusters.

Let’s find the important features.


Training a classifier to determine important features for the clusters

Classifier Precision and Recall


I did not know if there was a more efficient way to do this, but this seemed suitable: training a random forest classifier to classify the clusters acquired by k-means shows which features are most important in determining which song ends up in which cluster.

The precision values seem pretty okay, and the importances show that “mode”, “instrumentalness” and “energy” are very important when assigning songs to clusters.

Let’s take a look.


# A tibble: 6 x 3
  class precision recall
  <fct>     <dbl>  <dbl>
1 1         0.934  1    
2 2         0.926  0.893
3 3         0.970  0.914
4 4         0.933  0.848
5 5         0.929  0.743
6 6         0.898  0.981
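For completeness, per-class precision and recall like those in the table can be computed from two label lists. Below is a small Python sketch, assuming `truth` holds the k-means cluster of each song and `predicted` holds the classifier's output; both lists here are made up.

```python
def precision_recall(truth, predicted):
    """Per-class (precision, recall) from two equally long label lists."""
    result = {}
    for cls in sorted(set(truth) | set(predicted)):
        pairs = list(zip(truth, predicted))
        tp = sum(t == cls and p == cls for t, p in pairs)  # correctly assigned to cls
        fp = sum(t != cls and p == cls for t, p in pairs)  # wrongly assigned to cls
        fn = sum(t == cls and p != cls for t, p in pairs)  # missed members of cls
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        result[cls] = (precision, recall)
    return result

# Hypothetical labels for six songs.
truth = [1, 1, 1, 2, 2, 3]
predicted = [1, 1, 2, 2, 2, 3]
scores = precision_recall(truth, predicted)
print(scores)
```

Read this way, a recall of 0.743 for class 5 means that roughly a quarter of the songs k-means put in cluster 5 were assigned to another cluster by the classifier.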

Important features

Important features for clusters

Energy and instrumentalness

Mode


It is interesting to see that mode almost completely separates two groups of three clusters, while the actual playlists are distributed equally among the modes. This discriminative power of “mode” is probably due to its binary nature. Possibly I should combine key and mode into a single feature that covers all 24 possible key_modes.
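Combining the two could look something like this in Python. The encoding follows Spotify's audio features convention (key as pitch class 0-11 with 0 = C, mode 1 for major and 0 for minor); the function names and the choice of a 24-dimensional indicator vector are my own assumptions.

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def check(key, mode):
    """Reject values outside Spotify's ranges (key is -1 when no key was detected)."""
    if key not in range(12) or mode not in (0, 1):
        raise ValueError(f"unsupported key/mode combination: {key}, {mode}")

def key_mode(key, mode):
    """Readable label such as 'A minor' for one of the 24 key_modes."""
    check(key, mode)
    return f"{PITCH_CLASSES[key]} {'major' if mode == 1 else 'minor'}"

def one_hot_key_mode(key, mode):
    """24-dimensional indicator: indices 0-11 are minor keys, 12-23 major keys."""
    check(key, mode)
    vector = [0] * 24
    vector[mode * 12 + key] = 1
    return vector

print(key_mode(9, 0), one_hot_key_mode(9, 0).index(1))
```

An indicator vector avoids pretending that key 11 is "further" from key 0 than key 1 is, which a single numeric column would do in a distance-based method like k-means. The downside is 24 sparse columns instead of two.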

I should also train a classifier on the daily mixes to find out which features actually separate them, but it is currently 30 minutes before the deadline.


Estimating listen History from discover weeklies

Unfortunately Spotify’s recently played list does not let you see past the last 50 songs. This is a pity, because I could have used it for a classification task to determine which songs I have listened to. It is, however, possible to approach this in another way. If we take the daily mixes and write the weekly playlist as a weighted sum (weekly = daily1A * w1 + daily1B * w2+….), then it should be possible to fit these weights to the weekly mixes.
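Fitting such weights is an ordinary least-squares problem. Below is a minimal Python sketch under several assumptions of mine: each playlist is summarised as one feature vector (for example its mean audio features), the weights are unconstrained (they may come out negative or not sum to 1), and the system is small enough to solve through the normal equations. All names and numbers are hypothetical.

```python
def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("daily mixes are linearly dependent; weights are not unique")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    weights = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        rest = sum(aug[r][c] * weights[c] for c in range(r + 1, n))
        weights[r] = (aug[r][n] - rest) / aug[r][r]
    return weights

def fit_weights(dailies, weekly):
    """Least-squares weights w such that sum(w_i * daily_i) is closest to weekly."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    gram = [[dot(d1, d2) for d2 in dailies] for d1 in dailies]
    target = [dot(d, weekly) for d in dailies]
    return solve(gram, target)

# Hypothetical playlist summaries: (mean valence, mean energy, mean instrumentalness).
daily_a = [0.2, 0.9, 0.1]
daily_b = [0.8, 0.3, 0.6]
weekly = [0.7 * a + 0.3 * b for a, b in zip(daily_a, daily_b)]
print(fit_weights([daily_a, daily_b], weekly))
```

With more daily mixes than features the weights stop being unique, so in practice this would need either more features per playlist or a constraint such as non-negative weights.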

Discover weekly as a sum of x daily mixes

….

Discover weekly as a sum of 6 clusters for each day

(Calculating MFCCs for 10000 songs (2986 unique (jesus I am repetitive)) is going to take a day or two)

….

This is an example. It also shows that energy and valence are not the most important features